ABSTRACT

We describe an end-to-end real-time S&P futures trading system. Inner-shell stochastic nonlinear 
dynamic models are developed, and Canonical Momenta Indicators (CMI) are derived from a fitted 
Lagrangian used by outer-shell trading models dependent on these indicators. Recursive and adaptive 
optimization using Adaptive Simulated Annealing (ASA) is used for fitting parameters shared across
these shells of dynamic and trading models.

1. INTRODUCTION
1.1. Approaches 

Real-world problems are almost intractable analytically, yet methods must be devised to deal with 
this complexity to extract practical information in finite time. This is indeed true in the field of financial 
engineering, where time series of various financial instruments reflect nonequilibrium, highly nonlinear,
possibly even chaotic [1] underlying processes. A further difficulty is the huge amount of data necessary 
to be processed. Under these circumstances, to develop models and schemes for automated, profitable 
trading is a non-trivial task. 

In the context of this paper, it is important to stress that dealing with such complex systems 
invariably requires modeling of dynamics, modeling of actions on these dynamics, and algorithms to fit 
parameters in these models to real data. We have elected to use methods of mathematical physics for our 
models of the dynamics, artificial intelligence (AI) heuristics for our models of trading rules acting on 
indicators derived from our dynamics, and methods of sampling global optimization for fitting our 
parameters. Too often there is confusion about how these three elements are being used for a complete 
system. For example, in the literature there often is discussion of neural net trading systems or genetic 
algorithm trading systems. However, neural net models (used for either or both models discussed here) 
also require some method of fitting their parameters, and genetic algorithms must have some kind of cost 
function or process specified to sample a parameter space, etc. 

Some powerful methods have emerged over the years, from at least two directions. One
direction is based on inferring rules from past and current behavior of market data, leading to
learning-based, inductive techniques such as neural networks or fuzzy logic. Another direction starts from the
bottom up, trying to build physical and mathematical models based on different economic prototypes. In
many ways, these two directions are complementary, and a proper understanding of their main strengths
and weaknesses should lead to synergetic effects beneficial to their common goals.

Among approaches in the first direction, neural networks already have won a prominent role in the
financial community, due to their ability to handle large quantities of data, and to uncover and model
nonlinear functional relationships between various combinations of fundamental indicators and price
data [2,3].

In the second direction we can include models based on nonequilibrium statistical mechanics [4],
fractal geometry [5], turbulence [6], spin glasses and random matrix theory [7], renormalization group [8],
and gauge theory [9]. Although the very complex nonlinear multivariate character of financial markets is
recognized [10], these approaches seem to have had a lesser impact on current quantitative finance
practice, although it is becoming increasingly clear that this direction can lead to practical trading strategies
and models.

To bridge the gap between theory and practice, as well as to afford a comparison with neural 
networks techniques, here we focus on presenting an effective trading system of S&P futures, anchored in 
the physical principles of nonequilibrium statistical mechanics applied to financial markets [4,11].

Starting with nonlinear, multivariate stochastic differential equation descriptions of the
price evolution of cash and futures indices, we build an algebraic cost function in terms of a Lagrangian.
Then, a maximum likelihood fit to the data is performed using a global optimization algorithm, Adaptive
Simulated Annealing (ASA) [12]. Staying firmly rooted in field-theoretical concepts, we derive market
canonical momenta indicators, and we use these as technical signals in a recursive ASA optimization that
tunes the outer-shell of trading rules. We do not employ metaphors for these physical indicators, but 
rather derive them directly from models fit to data. 

The outline of the paper is as follows: Just below we briefly discuss the optimization method and 
momenta indicators. In the next three sections we establish the theoretical framework supporting our 
model, the statistical mechanics approach, and the optimization method. In Section 5 we detail the 
trading system, and in Section 6 we describe our results. Our conclusions are presented in Section 7. 

1.2. Optimization 

Large-scale, nonlinear fits of stochastic nonlinear forms to financial data require methods robust
enough across data sets. (For just one day, tick data for regular trading hours could reach 10,000-30,000 data
points.) Simple regression techniques exhibit deficiencies with respect to obtaining reasonable fits. They
too often get trapped in local minima typically found in nonlinear stochastic models of such data. ASA is
a global optimization algorithm that has the advantage, with respect to other global optimization
methods such as genetic algorithms, combinatorial optimization, etc., of being not only efficient in its
importance-sampling search strategy, but also of having a statistical guarantee of finding the best
optima [13,14]. This gives some confidence that a global minimum can be found, provided care
is taken as necessary to tune the algorithm [15].

It should be noted that such powerful sampling algorithms also are often required by other models 
of complex systems than those we use here [16]. For example, neural network models have taken 
advantage of ASA [17-19], as have other financial and economic studies [20,21]. 

1.3. Indicators 

In general, neural network approaches attempt classification and identification of patterns, or try 
forecasting patterns and future evolution of financial time series. Statistical mechanical methods attempt 
to find dynamic indicators derived from physical models based on general principles of nonequilibrium
stochastic processes that reflect certain market factors. These indicators are used subsequently to generate 
trading signals or to try forecasting upcoming data. 

In this paper, the main indicators are called Canonical Momenta Indicators (CMI), as they faithfully
carry, in a mathematical sense, the significance of market momentum, where the "mass" is inversely proportional to
the price volatility (the "masses" are just the elements of the metric tensor in this Lagrangian formalism) 
and the "velocity" is the rate of price changes. 

2. MODELS 

2.1. Langevin Equations for Random Walks 

The use of Brownian motion as a model for financial systems is generally attributed to 
Bachelier [22], though he incorrectly intuited that the noise scaled linearly instead of as the square root 
relative to the random log-price variable. Einstein is generally credited with using the correct 
mathematical description in a larger physical context of statistical systems. However, several studies 
imply that changing prices of many markets do not follow a random walk, that they may have long-term 
dependences in price correlations, and that they may not be efficient in quickly arbitraging new 
information [23-25]. A random walk for returns, the rate of change of prices over prices, is described by a
Langevin equation with simple additive noise $\eta$, typically representing the continual random influx of
information into the market,

$$\dot M = -f + g\eta ,$$
$$\dot M = dM/dt ,$$
$$\langle \eta(t) \rangle_\eta = 0 , \quad \langle \eta(t), \eta(t') \rangle_\eta = \delta(t - t') , \tag{1}$$

where $f$ and $g$ are constants, and $M$ is the logarithm of (scaled) price, $M(t) = \log(P(t)/P(t - dt))$. Price,
although the most dramatic observable, may not be the only appropriate dependent variable or order
parameter for the system of markets [26]. This possibility has also been called the "semistrong form of
the efficient market hypothesis" [23].

The generalization of this approach to include multivariate nonlinear nonequilibrium markets led to
a model of statistical mechanics of financial markets (SMFM) [11]. 

2.2. Adaptive Optimization of Models 

Our S&P model for the futures F is 

$$dF = \mu\, dt + \sigma F^x dz ,$$
$$\langle dz \rangle = 0 ,$$
$$\langle dz(t)\, dz(t') \rangle = dt\, \delta(t - t') .$$

We have used this model in several ways to fit the distribution's volatility, defined in terms of a scale
and an exponent of the independent variable [4].
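
As a rough illustration of the one-factor model above, dF = mu dt + sigma F^x dz, the following sketch generates a sample path with a simple Euler (Ito prepoint) update. The function name, the parameter values in the usage line, and the choice to stop when the price leaves the positive domain are assumptions made for this example only; they are not taken from the trading system itself.

```python
import math
import random


def simulate_futures(f0, mu, sigma, x, dt, n_steps, seed=0):
    """Euler (Ito prepoint) sample path of dF = mu dt + sigma F^x dz.

    All names here are illustrative; the drift mu and the scale sigma
    are held constant, as in the model quoted in the text.
    """
    rng = random.Random(seed)
    path = [f0]
    for _ in range(n_steps):
        f_prev = path[-1]
        if f_prev <= 0.0:
            # Assumption of this sketch: a non-positive price is treated
            # as outside the model's domain rather than being reflected.
            raise ValueError("price left the positive domain")
        dz = rng.gauss(0.0, math.sqrt(dt))  # <dz> = 0, <dz dz> = dt
        path.append(f_prev + mu * dt + sigma * f_prev ** x * dz)
    return path


# Example usage with made-up parameter values.
sample = simulate_futures(f0=1000.0, mu=0.0, sigma=0.1, x=0.7, dt=1.0, n_steps=50)
```

With sigma set to zero the path reduces to the deterministic drift, which gives a quick sanity check of the update rule.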

A major component of our trading system is the use of adaptive optimization, essentially constantly
retuning the parameters of our dynamic model each time new data is encountered in our training, testing
and real-time applications. The parameters $\{\mu, \sigma\}$ are constantly tuned using a quasi-local simplex
code [27,28] included with the ASA (Adaptive Simulated Annealing) code [12].

We have tested several quasi-local codes for this kind of trading problem, versus using robust ASA
adaptive optimizations, and the faster quasi-local codes seem to work quite well for adaptive updates after
a zeroth-order parameter set is found by ASA [29,30].

3. STATISTICAL MECHANICS OF FINANCIAL MARKETS (SMFM)

3.1. Statistical Mechanics of Large Systems 

Aggregation problems in nonlinear nonequilibrium systems typically are "solved" (accommodated) 
by having new entities/languages developed at these disparate scales in order to efficiently pass 
information back and forth between scales. This is quite different from the nature of quasi-equilibrium
quasi-linear systems, where thermodynamic or cybernetic approaches are possible. These thermodynamic 
approaches typically fail for nonequilibrium nonlinear systems. 

Many systems are aptly modeled in terms of multivariate differential rate-equations, known as
Langevin equations [31],

$$\dot M^G = f^G + \hat g^G_j \eta^j , \quad (G = 1, \ldots, \Lambda) , \quad (j = 1, \ldots, N) ,$$
$$\dot M^G = dM^G/dt ,$$
$$\langle \eta^j(t) \rangle_\eta = 0 , \quad \langle \eta^j(t), \eta^{j'}(t') \rangle_\eta = \delta^{jj'} \delta(t - t') , \tag{2}$$

where $f^G$ and $\hat g^G_j$ are generally nonlinear functions of mesoscopic order parameters $M^G$, $j$ is a
microscopic index indicating the source of fluctuations, and $N \ge \Lambda$. The Einstein convention of summing
over repeated indices is used. Vertical bars on an index, e.g., $|j|$, imply no sum is to be taken on repeated
indices.

Via a somewhat lengthy, albeit instructive calculation, outlined in several other papers [11,32,33],
involving an intermediate derivation of a corresponding Fokker-Planck or Schrödinger-type equation for
the conditional probability distribution $P[M(t)|M(t_0)]$, the Langevin rate Eq. (2) is developed into the
more useful probability distribution for $M^G$ at long-time macroscopic time event $t = (u + 1)\theta + t_0$, in
terms of a Stratonovich path-integral over mesoscopic Gaussian conditional probabilities [34-38]. Here,
macroscopic variables are defined as the long-time limit of the evolving mesoscopic system.

The corresponding Schrödinger-type equation is [36,37]

$$\partial P/\partial t = \frac12 (g^{GG'} P)_{,GG'} - (g^G P)_{,G} + V ,$$
$$g^{GG'} = k_T \delta^{jk} \hat g^G_j \hat g^{G'}_k ,$$
$$g^G = f^G + \frac12 \delta^{jk} \hat g^{G'}_j \hat g^G_{k,G'} ,$$
$$[\cdots]_{,G} = \partial[\cdots]/\partial M^G . \tag{3}$$

This is properly referred to as a Fokker-Planck equation when $V = 0$. Note that although the partial
differential Eq. (3) contains information regarding $M^G$ as in the stochastic differential Eq. (2), all
references to $j$ have been properly averaged over. I.e., $\hat g^G_j$ in Eq. (2) is an entity with parameters in both
microscopic and mesoscopic spaces, but $M$ is a purely mesoscopic variable, and this is more clearly
reflected in Eq. (3).

The path integral representation is given in terms of the "Feynman" Lagrangian $L$,

$$P[M_t|M_{t_0}]\, dM(t) = \int \cdots \int DM \exp(-S)\, \delta[M(t_0) = M_0]\, \delta[M(t) = M_t] ,$$
$$S = k_T^{-1} \min \int_{t_0}^{t} dt' L ,$$
$$DM = \lim_{u \to \infty} \prod_{\nu=1}^{u+1} g^{1/2} \prod_G (2\pi\theta)^{-1/2} dM^G_\nu ,$$
$$L(\dot M^G, M^G, t) = \frac12 (\dot M^G - h^G) g_{GG'} (\dot M^{G'} - h^{G'}) + \frac12 h^G_{;G} + R/6 - V ,$$
$$h^G = g^G - \frac12 g^{-1/2} (g^{1/2} g^{GG'})_{,G'} ,$$
$$g_{GG'} = (g^{GG'})^{-1} ,$$
$$g = \det(g_{GG'}) ,$$
$$h^G_{;G} = h^G_{,G} + \Gamma^F_{GF} h^G = g^{-1/2} (g^{1/2} h^G)_{,G} ,$$
$$\Gamma^F_{JK} \equiv g^{LF} [JK, L] = g^{LF} (g_{JL,K} + g_{KL,J} - g_{JK,L}) ,$$
$$R = g^{JL} R_{JL} = g^{JL} g^{FK} R_{FJKL} ,$$
$$R_{FJKL} = \frac12 (g_{FK,JL} - g_{JK,FL} - g_{FL,JK} + g_{JL,FK}) + g_{MN} (\Gamma^M_{FK} \Gamma^N_{JL} - \Gamma^M_{FL} \Gamma^N_{JK}) . \tag{4}$$

Mesoscopic variables have been defined as $M^G$ in the Langevin and Fokker-Planck representations, in
terms of their development from the microscopic system labeled by $j$. The Riemannian curvature term $R$
arises from nonlinear $g_{GG'}$, which is a bona fide metric of this space [36]. Even if a stationary solution,
i.e., $\dot M^G = 0$, is ultimately sought, a necessarily prior stochastic treatment of $\dot M^G$ terms gives rise to these
Riemannian "corrections." Even for a constant metric, the term $h^G_{;G}$ contributes to $L$ for a nonlinear
mean $h^G$. $V$ may include terms such as $\sum_{T'} J_{T'G} M^G$, where the Lagrange multipliers $J_{T'G}$ are constraints
on $M^G$, which are advantageously modeled as extrinsic sources in this representation; they too may be
time-dependent.

For our purposes, the above Feynman Lagrangian defines a kernel of the short-time conditional 
probability distribution, in the curved space defined by the metric, in the limit of continuous time, whose 
iteration yields the solution of the previous partial differential equation Eq. (3). This differs from the 
Lagrangian which satisfies the requirement that the action is stationary to the first order in dt — the 
WKBJ approximation, but which does not include the first-order correction to the WKBJ approximation 
as does the Feynman Lagrangian. This latter Lagrangian differs from the Feynman Lagrangian, 
essentially by replacing R/6 above by R/12 [39]. In this sense, the WKBJ Lagrangian is more useful for 
some theoretical discussions [40]. However, the use of the Feynman Lagrangian coincides with the 
numerical method we use for long-time development of our distributions using our PATHINT code for 
other financial products, e.g., options [4]. This also is consistent with our use of relatively short-time 
"forecast" of data points using the most probable path [41] 

$$dM^G/dt = g^G - g^{1/2} (g^{-1/2} g^{GG'})_{,G'} . \tag{5}$$

Using the variational principle, $J_{TG}$ may also be used to constrain $M^G$ to regions where they are
empirically bound. More complicated constraints may be affixed to $L$ using methods of optimal control
theory [42]. With respect to a steady state $\bar P$, when it exists, the information gain in state $\tilde P$ is defined by

$$\Upsilon[\tilde P] = \int \cdots \int DM' \, \tilde P \ln(\tilde P/\bar P) ,$$
$$DM' = DM/dM_{u+1} . \tag{6}$$

In the economics literature, there appears to be sentiment to define Eq. (2) by the Ito, rather than the
Stratonovich prescription. It is true that Ito integrals have Martingale properties not possessed by
Stratonovich integrals [43], which leads to risk-neutral theorems for markets [44,45], but the nature of the
proper mathematics (actually a simple transformation between these two discretizations) should
eventually be determined by proper aggregation of relatively microscopic models of markets. It should be
noted that virtually all investigations of other physical systems, which are also continuous time models of
discrete processes, conclude that the Stratonovich interpretation coincides with reality, when
multiplicative noise with zero correlation time, modeled in terms of white noise $\eta^j$, is properly considered
as the limit of real noise with finite correlation time [46]. The path integral succinctly demonstrates the
difference between the two: The Ito prescription corresponds to the prepoint discretization of $L$, wherein
$\theta \dot M(t) \to M_{\nu+1} - M_\nu$ and $M(t) \to M_\nu$. The Stratonovich prescription corresponds to the midpoint
discretization of $L$, wherein $\theta \dot M(t) \to M_{\nu+1} - M_\nu$ and $M(t) \to \frac12 (M_{\nu+1} + M_\nu)$. In terms of the functions
appearing in the Fokker-Planck Eq. (3), the Ito prescription of the prepoint discretized Lagrangian, $L_I$, is
relatively simple, albeit deceptively so because of its nonstandard calculus,

$$L_I(\dot M^G, M^G, t) = \frac12 (\dot M^G - g^G) g_{GG'} (\dot M^{G'} - g^{G'}) - V . \tag{7}$$

In the absence of a nonphenomenological microscopic theory, the difference between an Ito prescription
and a Stratonovich prescription is simply a transformed drift [39].
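
As a one-variable illustration of this transformed drift, the block below writes out the standard conversion between the two prescriptions for a single noise source with $k_T = 1$. The multiplicative-noise example $\hat g = \sigma M$ is chosen here purely for illustration and is not a model used in the text; for additive noise (constant $\hat g$) the correction vanishes and the two prescriptions coincide.

```latex
\begin{align*}
\text{Stratonovich (midpoint):}\quad
  & \dot M = f(M) + \hat g(M)\,\eta , \\
\text{Ito (prepoint):}\quad
  & \dot M = \Big[ f(M) + \tfrac12\, \hat g(M)\,
      \frac{\partial \hat g(M)}{\partial M} \Big] + \hat g(M)\,\eta , \\
\text{example } \hat g = \sigma M :\quad
  & \text{Ito drift} = f(M) + \tfrac12\, \sigma^2 M .
\end{align*}
```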

There are several other advantages to Eq. (4) over Eq. (2). Extrema and most probable states of
$M^G$, $\ll M^G \gg$, are simply derived by a variational principle, similar to conditions sought in previous
studies [47]. In the Stratonovich prescription, necessary, albeit not sufficient, conditions are given by

$$\delta_G L = L_{,G} - L_{,\dot G : t} = 0 ,$$
$$L_{,\dot G : t} = L_{,\dot G G'} \dot M^{G'} + L_{,\dot G \dot G'} \ddot M^{G'} . \tag{8}$$

For stationary states, $\dot M^G = 0$, and $\partial \bar L/\partial \bar M^G = 0$ defines $\ll \bar M^G \gg$, where the bars identify stationary
variables; in this case, the macroscopic variables are equal to their mesoscopic counterparts. Note that $\bar L$
is not the stationary solution of the system, e.g., to Eq. (3) with $\partial P/\partial t = 0$. However, in some cases [48],
$\bar L$ is a definite aid to finding such stationary states. Many times only properties of stationary states are
examined, but here a temporal dependence is included. E.g., the $\dot M^G$ terms in $L$ permit steady states and
their fluctuations to be investigated in a nonequilibrium context. Note that Eq. (8) must be derived from
the path integral, Eq. (4), which is at least one reason to justify its development.



Optimization of Trading 



- 10- 



Ingber & Mondescu 



3.2. Algebraic Complexity Yields Simple Intuitive Results 

It must be emphasized that the output of this formalism is not confined to complex algebraic forms 
or tables of numbers. Because L possesses a variational principle, sets of contour graphs, at different 
long-time epochs of the path-integral of P over its variables at all intermediate times, give a visually 
intuitive and accurate decision-aid to view the dynamic evolution of the scenario. For example, this 
Lagrangian approach permits a quantitative assessment of concepts usually only loosely defined. 

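$$\text{"Momentum"} = \Pi^G = \frac{\partial L}{\partial(\partial M^G/\partial t)} ,$$
$$\text{"Mass"} = g_{GG'} = \frac{\partial^2 L}{\partial(\partial M^G/\partial t)\, \partial(\partial M^{G'}/\partial t)} ,$$
$$\text{"Force"} = \frac{\partial L}{\partial M^G} ,$$
$$\text{"}F = ma\text{"}: \quad \delta L = 0 = \frac{\partial L}{\partial M^G} - \frac{\partial}{\partial t} \frac{\partial L}{\partial(\partial M^G/\partial t)} , \tag{9}$$

where $M^G$ are the variables and $L$ is the Lagrangian. These physical entities provide another form of
intuitive, but quantitatively precise, presentation of these analyses. For example, daily newspapers use
some of this terminology to discuss the movement of security prices. In this paper, the $\Pi^G$ serve as
canonical momenta indicators (CMI) for these systems.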



3.2.1. Derived Canonical Momenta Indicators (CMI) 

The extreme sensitivity of the CMI gives rapid feedback on changes in trends as well as the
volatility of markets, and the CMI are therefore good indicators to use for trading rules [29]. A time-locked
moving average provides manageable indicators for trading signals. This current project uses such CMI
developed as a byproduct of the ASA fits described below.



3.3. Correlations 

In this paper we report results of our one-variable trading model. However, it is straightforward to
include multi-variable trading models in our approach, and we have done this, for example, with coupled
cash and futures S&P markets.

Correlations between variables are modeled explicitly in the Lagrangian as a parameter usually
designated $\rho$. This section uses a simple two-factor model to develop the correspondence between the
correlation $\rho$ in the Lagrangian and that among the commonly written Wiener distribution $dz$.

Consider coupled stochastic differential equations for futures $F$ and cash $C$:

$$dF = f^F(F, C)\, dt + \hat g^F(F, C)\, \sigma_F\, dz_F ,$$
$$dC = f^C(F, C)\, dt + \hat g^C(F, C)\, \sigma_C\, dz_C ,$$
$$\langle dz_i \rangle = 0 , \quad i = \{F, C\} ,$$
$$\langle dz_i(t)\, dz_j(t') \rangle = dt\, \delta(t - t') , \quad i = j ,$$
$$\langle dz_i(t)\, dz_j(t') \rangle = \rho\, dt\, \delta(t - t') , \quad i \ne j , \tag{10}$$

where $\langle \cdot \rangle$ denotes expectations with respect to the multivariate distribution.

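These can be rewritten as Langevin equations (in the Ito prepoint discretization)

$$dF/dt = f^F + \hat g^F \sigma_F (\gamma^+ \eta_1 + \mathrm{sgn}\rho\, \gamma^- \eta_2) ,$$
$$dC/dt = f^C + \hat g^C \sigma_C (\mathrm{sgn}\rho\, \gamma^- \eta_1 + \gamma^+ \eta_2) ,$$
$$\gamma^\pm = \frac{1}{\sqrt 2} [1 \pm (1 - \rho^2)^{1/2}]^{1/2} ,$$
$$n_i = (dt)^{1/2} p_i , \tag{11}$$

where $p_1$ and $p_2$ are independent [0,1] Gaussian distributions (zero mean, unit standard deviation).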
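
The mixing of two independent Gaussian sources into two correlated noise terms can be checked numerically. The sketch below assumes mixing coefficients of the form gamma(+/-) = (1/sqrt 2) [1 +/- (1 - rho^2)^(1/2)]^(1/2), which satisfy gamma(+)^2 + gamma(-)^2 = 1 and 2 gamma(+) gamma(-) = |rho|, so that each mixed term keeps unit variance and the pair has correlation rho. The function names are illustrative only.

```python
import math
import random


def gamma_pm(rho):
    """Return (gamma_plus, gamma_minus) for a correlation -1 <= rho <= 1."""
    root = math.sqrt(1.0 - rho * rho)
    return math.sqrt((1.0 + root) / 2.0), math.sqrt((1.0 - root) / 2.0)


def correlated_pair(rho, rng):
    """Mix two independent unit Gaussians into a pair with correlation rho."""
    g_plus, g_minus = gamma_pm(rho)
    sign = math.copysign(1.0, rho)
    e1 = rng.gauss(0.0, 1.0)
    e2 = rng.gauss(0.0, 1.0)
    noise_f = g_plus * e1 + sign * g_minus * e2  # enters the futures equation
    noise_c = sign * g_minus * e1 + g_plus * e2  # enters the cash equation
    return noise_f, noise_c


# Example usage: empirical correlation of many draws should be close to rho.
rng = random.Random(0)
draws = [correlated_pair(0.6, rng) for _ in range(20000)]
empirical_rho = sum(a * b for a, b in draws) / len(draws)
```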

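The equivalent short-time probability distribution, $P$, for the above set of equations is

$$P = g^{1/2} (2\pi\, dt)^{-1/2} \exp(-L\, dt) ,$$
$$L = \frac12 M^\dagger g M ,$$
$$M = \begin{pmatrix} dF/dt - f^F \\ dC/dt - f^C \end{pmatrix} ,$$
$$g = \det(g) . \tag{12}$$

$g$, the metric in $\{F, C\}$-space, is the inverse of the covariance matrix,

$$g^{-1} = \begin{pmatrix} (\hat g^F \sigma_F)^2 & \rho\, \hat g^F \hat g^C \sigma_F \sigma_C \\ \rho\, \hat g^F \hat g^C \sigma_F \sigma_C & (\hat g^C \sigma_C)^2 \end{pmatrix} . \tag{13}$$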

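The CMI indicators are given by the formulas

$$\Pi^F = \frac{(dF/dt - f^F)}{(\hat g^F \sigma_F)^2 (1 - \rho^2)} - \frac{\rho\, (dC/dt - f^C)}{\hat g^F \hat g^C \sigma_F \sigma_C (1 - \rho^2)} ,$$
$$\Pi^C = \frac{(dC/dt - f^C)}{(\hat g^C \sigma_C)^2 (1 - \rho^2)} - \frac{\rho\, (dF/dt - f^F)}{\hat g^C \hat g^F \sigma_C \sigma_F (1 - \rho^2)} . \tag{14}$$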
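
A minimal numerical sketch of the two-factor CMI follows, assuming that the momenta are obtained by applying the metric (the inverse of the 2x2 covariance matrix of the noise terms) to the drift-subtracted velocities. The arguments `s_f` and `s_c` stand for the full noise amplitudes multiplying dz in the futures and cash equations; these names and the function itself are hypothetical.

```python
def cmi_two_factor(df_dt, dc_dt, f_f, f_c, s_f, s_c, rho):
    """Canonical momenta (Pi_F, Pi_C) for two correlated markets.

    df_dt, dc_dt : observed rates of change of futures and cash prices
    f_f, f_c     : fitted drifts
    s_f, s_c     : noise amplitudes (illustrative names)
    rho          : correlation, required to satisfy |rho| < 1
    """
    if abs(rho) >= 1.0:
        raise ValueError("the metric is singular for |rho| >= 1")
    m_f = df_dt - f_f
    m_c = dc_dt - f_c
    denom = 1.0 - rho * rho
    pi_f = m_f / (s_f ** 2 * denom) - rho * m_c / (s_f * s_c * denom)
    pi_c = m_c / (s_c ** 2 * denom) - rho * m_f / (s_f * s_c * denom)
    return pi_f, pi_c


# Example usage with made-up numbers.
pi_f, pi_c = cmi_two_factor(1.0, -1.0, 0.0, 0.0, 2.0, 1.0, 0.5)
```

Multiplying the resulting momenta by the covariance matrix should return the drift-subtracted velocities, which is a convenient consistency check.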



3.4. ASA Outline 

The algorithm Adaptive Simulated Annealing (ASA) fits short-time probability distributions to
observed data, using a maximum likelihood technique on the Lagrangian. This algorithm has been 
developed to fit observed data to a theoretical cost function over a D-dimensional parameter space [13], 
adapting for varying sensitivities of parameters during the fit. The ASA code can be obtained at no 
charge, via WWW from http://www.ingber.com/ or via FTP from ftp.ingber.com [12]. 

3.4.1. General Description 

Simulated annealing (SA) was developed in 1983 to deal with highly nonlinear problems [49], as an 
extension of a Monte-Carlo importance-sampling technique developed in 1953 for chemical physics 
problems. It helps to visualize the problems presented by such complex systems as a geographical terrain. 
For example, consider a mountain range, with two "parameters," e.g., along the North-South and 
East-West directions. We wish to find the lowest valley in this terrain. SA approaches this problem
much like a bouncing ball that can bounce over mountains from valley to valley. We start at a high
"temperature," where the temperature is an SA parameter that mimics the effect of a fast-moving particle
in a hot object like molten metal, thereby permitting the ball to make very high bounces and to
bounce over any mountain to access any valley, given enough bounces. As the temperature is
made relatively colder, the ball cannot bounce so high, and it also can settle to become trapped in
relatively smaller ranges of valleys.

We imagine that our mountain range is aptly described by a "cost function." We define probability 
distributions of the two directional parameters, called generating distributions since they generate possible 
valleys or states we are to explore. We define another distribution, called the acceptance distribution, 
which depends on the difference of cost functions of the present generated valley we are to explore and 
the last saved lowest valley. The acceptance distribution decides probabilistically whether to stay in a new 
lower valley or to bounce out of it. All the generating and acceptance distributions depend on 
temperatures. 
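
The role of the acceptance distribution can be sketched with the classical Boltzmann (Metropolis-type) criterion. This particular functional form is an assumption of the example, since the text above does not specify it; the function name is also illustrative.

```python
import math
import random


def accept(cost_new, cost_saved, temperature, rng):
    """Decide probabilistically whether to move to the newly generated state.

    Downhill moves are always accepted; uphill moves are accepted with
    probability exp(-delta / temperature), so high temperatures allow
    large "bounces" and low temperatures trap the search in a valley.
    """
    delta = cost_new - cost_saved
    if delta <= 0.0:
        return True
    return rng.random() < math.exp(-delta / temperature)


# Example usage: the same uphill step at a hot and at a cold temperature.
rng = random.Random(0)
hot = sum(accept(1.0, 0.0, 100.0, rng) for _ in range(1000))
cold = sum(accept(1.0, 0.0, 0.01, rng) for _ in range(1000))
```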

In 1984 [50], it was proved that, by carefully controlling the rates of
cooling of temperatures, SA could statistically find the best minimum, e.g., the lowest valley of our
example above. This was good news for people trying to solve hard problems which could not be solved
by other algorithms. The bad news was that the guarantee was only good if they were willing to run SA
forever. In 1987, a method of fast annealing (FA) was developed [51], which permitted lowering the
temperature exponentially faster, thereby statistically guaranteeing that the minimum could be found in
some finite time. However, that time still could be quite long. Shortly thereafter, Very Fast Simulated
Reannealing (VFSR) was developed in 1987 [13], now called Adaptive Simulated Annealing (ASA),
which is exponentially faster than FA.

ASA has been applied to many problems by many people in many disciplines [15,16,52]. The 
feedback of many users regularly scrutinizing the source code ensures its soundness as it becomes more 
flexible and powerful. 

3.4.2. Mathematical Outline 

ASA considers a parameter $\alpha^i_k$ in dimension $i$ generated at annealing-time $k$ with the range

$$\alpha^i_k \in [A_i, B_i] , \tag{15}$$

calculated with the random variable $y^i$,

$$\alpha^i_{k+1} = \alpha^i_k + y^i (B_i - A_i) ,$$
$$y^i \in [-1, 1] . \tag{16}$$

The generating function $g_T(y)$ is defined,

$$g_T(y) = \prod_{i=1}^D \frac{1}{2(|y^i| + T_i) \ln(1 + 1/T_i)} \equiv \prod_{i=1}^D g^i_T(y^i) , \tag{17}$$

where the subscript $i$ on $T_i$ specifies the parameter index, and the $k$-dependence in $T_i(k)$ for the annealing
schedule has been dropped for brevity. Its cumulative probability distribution is

$$G_T(y) = \int_{-1}^{y^1} \cdots \int_{-1}^{y^D} dy'^1 \cdots dy'^D \, g_T(y') \equiv \prod_{i=1}^D G^i_T(y^i) ,$$
$$G^i_T(y^i) = \frac12 + \frac{\mathrm{sgn}(y^i)}{2} \frac{\ln(1 + |y^i|/T_i)}{\ln(1 + 1/T_i)} . \tag{18}$$

$y^i$ is generated from a $u^i$ from the uniform distribution

$$u^i \in U[0, 1] ,$$
$$y^i = \mathrm{sgn}(u^i - \tfrac12)\, T_i \left[ (1 + 1/T_i)^{|2u^i - 1|} - 1 \right] . \tag{19}$$

It is straightforward to calculate that for an annealing schedule for $T_i$

$$T_i(k) = T_{0i} \exp(-c_i k^{1/D}) , \tag{20}$$

a global minimum statistically can be obtained. I.e.,

$$\sum_{k_0}^\infty g_k \approx \sum_{k_0}^\infty \left[ \prod_{i=1}^D \frac{1}{2 |y^i| c_i} \right] \frac{1}{k} = \infty . \tag{21}$$

Control can be taken over $c_i$, such that

$$T_{fi} = T_{0i} \exp(-m_i) \ \text{when} \ k_f = \exp n_i ,$$
$$c_i = m_i \exp(-n_i/D) , \tag{22}$$

where $m_i$ and $n_i$ can be considered "free" parameters to help tune ASA for specific problems.
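
The generating step and the annealing schedule outlined above can be sketched in a few lines. The sketch assumes the forms y = sgn(u - 1/2) T [(1 + 1/T)^|2u - 1| - 1] for the random variable, T(k) = T0 exp(-c k^(1/D)) for the schedule, and c = m exp(-n/D) for its scale. The function names and the simple redraw-until-in-range rule are choices made for this illustration; the distributed ASA code is written in C and is considerably more elaborate.

```python
import math
import random


def schedule_scale(m, n, dim):
    """Scale c chosen so that T falls by exp(-m) when k reaches exp(n)."""
    return m * math.exp(-n / dim)


def temperature(k, t0, c, dim):
    """Annealing schedule T(k) = T0 exp(-c k^(1/D))."""
    return t0 * math.exp(-c * k ** (1.0 / dim))


def sample_y(t, rng):
    """Draw y in [-1, 1] by inverting the cumulative generating distribution."""
    u = rng.random()
    sign = 1.0 if u >= 0.5 else -1.0
    return sign * t * ((1.0 + 1.0 / t) ** abs(2.0 * u - 1.0) - 1.0)


def generate(alpha, lo, hi, t, rng, max_tries=1000):
    """Propose a new parameter value, redrawing until it lies in [lo, hi]."""
    for _ in range(max_tries):
        candidate = alpha + sample_y(t, rng) * (hi - lo)
        if lo <= candidate <= hi:
            return candidate
    return alpha  # fall back to the current value if no draw is in range


# Example usage: proposals tighten around the current point as T falls.
rng = random.Random(0)
c = schedule_scale(m=5.0, n=10.0, dim=4)
wide = generate(0.5, 0.0, 1.0, temperature(1.0, 1.0, c, 4), rng)
narrow = generate(0.5, 0.0, 1.0, temperature(20000.0, 1.0, c, 4), rng)
```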

ASA has over 100 OPTIONS available for tuning. A few important ones were used in this project.

3.4.3. Reannealing

Whenever doing a multi-dimensional search in the course of a complex nonlinear physical problem,
inevitably one must deal with different changing sensitivities of the $\alpha^i$ in the search. At any given
annealing-time, the range over which the relatively insensitive parameters are being searched can be
"stretched out" relative to the ranges of the more sensitive parameters. This can be accomplished by
periodically rescaling the annealing-time $k$, essentially reannealing, every hundred or so
acceptance-events (or at some user-defined modulus of the number of accepted or generated states), in terms of the
sensitivities $s_i$ calculated at the most current minimum value of the cost function, $C$,

$$s_i = \partial C/\partial \alpha^i . \tag{23}$$

In terms of the largest $s_i = s_{\max}$, a default rescaling is performed for each $k_i$ of each parameter dimension,
whereby a new index $k'_i$ is calculated from each $k_i$,

$$k_i \to k'_i ,$$
$$T'_{ik'} = T_{ik} (s_{\max}/s_i) ,$$
$$k'_i = \left( \ln(T_{i0}/T_{ik'})/c_i \right)^D . \tag{24}$$

$T_{i0}$ is set to unity to begin the search, which is ample to span each parameter dimension.
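
The rescaling of the annealing-time index can be sketched as follows, assuming the schedule T(k) = T0 exp(-c k^(1/D)), the stretched temperature T' = T (s_max / s_i), and the inverted index k' = (ln(T0 / T') / c)^D. Sensitivities are taken here as positive magnitudes, and capping the rescaled temperature at T0 (so that k' cannot become negative) is an assumption of this sketch; the function name is hypothetical.

```python
import math


def reanneal_index(k, t0, c, dim, s_i, s_max):
    """Map the annealing-time index k of one parameter to its rescaled k'.

    s_i is the magnitude of this parameter's sensitivity and s_max the
    largest magnitude over all parameters (both assumed positive).
    """
    t_k = t0 * math.exp(-c * k ** (1.0 / dim))
    t_new = min(t_k * (s_max / s_i), t0)  # cap at T0: assumption of this sketch
    return (math.log(t0 / t_new) / c) ** dim


# Example usage: the most sensitive parameter keeps its index, while a
# less sensitive one is moved back to an earlier (hotter) annealing-time.
same = reanneal_index(100.0, 1.0, 1.0, 2, s_i=3.0, s_max=3.0)
earlier = reanneal_index(100.0, 1.0, 1.0, 2, s_i=1.0, s_max=math.e)
```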



3.4.4. Quenching 

Another adaptive feature of ASA is its ability to perform quenching in a methodical fashion. This
is applied by noting that the temperature schedule above can be redefined as

$$T_i(k_i) = T_{0i} \exp(-c_i k_i^{Q_i/D}) ,$$
$$c_i = m_i \exp(-n_i Q_i/D) , \tag{25}$$

in terms of the "quenching factor" $Q_i$. The sampling proof fails if $Q_i > 1$ as

$$\sum_k \prod^D 1/k^{Q_i/D} = \sum_k 1/k^{Q_i} < \infty . \tag{26}$$

This simple calculation shows how the "curse of dimensionality" arises, and also gives a possible
way of living with this disease. In ASA, the influence of large dimensions becomes clearly focussed on
the exponential of the power of $k$ being $1/D$, as the annealing required to properly sample the space
becomes prohibitively slow. So, if resources cannot be committed to properly sample the space, then for
some systems perhaps the next best procedure may be to turn on quenching, whereby $Q_i$ can become on
the order of the size of number of dimensions.

The scale of the power of $1/D$ temperature schedule used for the acceptance function can be altered
in a similar fashion. However, this does not affect the annealing proof of ASA, and so this may be used
without damaging the sampling property.
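
The effect of the quenching factor can be illustrated numerically. The sketch below assumes the quenched schedule T(k) = T0 exp(-c k^(Q/D)) with c = m exp(-n Q/D), and compares partial sums of 1/k^Q: for Q = 1 the sum keeps growing (roughly logarithmically), which is what the sampling argument relies on, whereas for Q > 1 it converges. Function names and numerical values are illustrative only.

```python
import math


def quenched_temperature(k, t0, m, n, q, dim):
    """Quenched schedule T(k) = T0 exp(-c k^(Q/D)), c = m exp(-n Q / D)."""
    c = m * math.exp(-n * q / dim)
    return t0 * math.exp(-c * k ** (q / dim))


def partial_sum(q, n_terms):
    """Partial sum of 1/k^Q for k = 1 .. n_terms."""
    return sum(1.0 / k ** q for k in range(1, n_terms + 1))


# Example usage: growth of the partial sums between 10^4 and 10^5 terms.
growth_q1 = partial_sum(1.0, 10 ** 5) - partial_sum(1.0, 10 ** 4)  # keeps growing
growth_q2 = partial_sum(2.0, 10 ** 5) - partial_sum(2.0, 10 ** 4)  # nearly zero
```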

3.4.5. Avoiding Repeating Cost Functions 

Doing a recursive optimization is very CPU expensive, as essentially the cross-product of parameter 
spaces among the various levels of optimization is required. 

Therefore, we have used an ASA OPTION for some of the parameters in the outer-shell trading 
model optimization of training sets, ASA_QUEUE, which sets up a first-in first-out (FIFO) queue, of 
user-defined size Queue_Size to collect generated states. When a new state is generated, its parameters 
are tested, within specified resolutions of a user-defined array Queue_Resolution[]. When parameter sets
are repeated within this queue, the saved value of the cost function is returned without having to repeat
the calculation. 
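
The idea behind such a queue can be sketched as a small FIFO cache wrapped around the cost function. This is not the ASA implementation (which is written in C); the class name, its arguments, and the choice to compare states by rounding each parameter to a grid of the given resolution are all assumptions of this illustration.

```python
from collections import OrderedDict


class CostQueue:
    """FIFO cache of recently evaluated states, keyed by quantized parameters."""

    def __init__(self, cost_fn, queue_size, resolution):
        self.cost_fn = cost_fn
        self.queue_size = queue_size
        self.resolution = resolution  # one resolution per parameter
        self.cache = OrderedDict()
        self.hits = 0

    def _key(self, params):
        # Two states share a key when every parameter rounds to the same
        # grid cell; this approximates "equal within resolution".
        return tuple(round(p / r) for p, r in zip(params, self.resolution))

    def __call__(self, params):
        key = self._key(params)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]  # saved cost, no recalculation
        value = self.cost_fn(params)
        self.cache[key] = value
        if len(self.cache) > self.queue_size:
            self.cache.popitem(last=False)  # drop the oldest entry (FIFO)
        return value


# Example usage with a cheap stand-in for an expensive cost function.
queued_cost = CostQueue(lambda p: sum(v * v for v in p), 100, [0.1, 0.1])
first = queued_cost([1.00, 2.00])
second = queued_cost([1.01, 2.01])  # served from the queue
```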

3.4.6. Multiple Local Minima 

Our criterion for the global minimum of our cost function is minus the largest profit over a selected
training data set (or in some cases, this value divided by the maximum drawdown). However, in many
cases this may not give us the best set of parameters to find profitable trading in test sets or in real-time
trading. Other considerations, such as the total number of trades developed by the global minimum versus
other close local minima, may be relevant. For example, if the global minimum has just a few trades,
while some nearby local minima (in terms of the value of the cost function) have many trades and were
profitable in spite of our slippage factors, then the scenario with more trades might be more statistically
dependable to deliver profits across testing and real-time data sets.

Therefore, for the outer-shell global optimization of training sets, we have used an ASA OPTION,
MULTI_MIN, which saves a user-defined number of closest local minima within a user-defined resolution
of the parameters. We then examine these results under several testing sets.

4. TRADING SYSTEM

4.1. Use of CMI

As the CMI formalism carries the relevant information regarding the price dynamics, we have used
it as a signal generator for an automated trading system for S&P futures.

Based on previous work [30] applied to daily closing data, the overall structure of the trading
system consists of two layers, as follows: We first construct the "short-time" Lagrangian function in the Ito
representation (with the notation introduced in Section 3.3)



$$L(i|i-1)\, dt = \frac{1}{2\sigma^2 F_{i-1}^{2x}} \left( \frac{dF_i}{dt} - f^F \right)^2 dt , \tag{27}$$

with $i$ the post-point index, corresponding to the one-factor price model

$$dF = f^F dt + \sigma F^x dz(t) , \tag{28}$$

where $f^F$ and $\sigma > 0$ are taken to be constants, $F(t)$ is the S&P future price, and $dz$ is the standard
Gaussian noise with zero mean and unit standard deviation. We perform a global, maximum likelihood fit
to the whole set of price data using ASA. This procedure produces the optimization parameters $\{x, f^F\}$
that are used to generate the CMI. One computational approach was to fix the diffusion multiplier $\sigma$ to 1
during training for convenience, while treating it as a free parameter in the adaptive testing and real-time fits.
Another approach was to fix the scale of the volatility, using an improved model,

$$dF = f^F dt + \sigma \left( \frac{F}{\langle F \rangle} \right)^x dz(t) , \tag{29}$$

where $\sigma$ now is calculated as the standard deviation of the price increments $\Delta F/dt^{1/2}$, and $\langle F \rangle$ is just
the average of the prices.

As already remarked, to enhance the CMI sensitivity and response time to local variations (across a
certain window size) in the distribution of price increments, the momenta are generated by applying an
adaptive procedure, i.e., after each new data reading another set of $\{f^F, \sigma\}$ parameters is calculated for
the last window of data, with the exponent $x$ (a contextual indicator of the noise statistics) fixed to the
value obtained from the global fit.

The CMI computed in this manner are fed into the outer shell of the trading system, where an AI- 
type optimization of the trading rules is executed, using ASA once again. 
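
The adaptive, windowed computation of the one-factor CMI described above can be sketched as follows. Note the substitution: where the system retunes the drift and the volatility scale with a quasi-local simplex code, this sketch uses the closed-form Gaussian maximum-likelihood estimates over each window instead, with the exponent x held fixed. The momentum is taken as the drift-subtracted velocity divided by the variance (sigma F^x)^2 at the prepoint. The function name, the `var_floor` guard and the example prices are hypothetical.

```python
def windowed_cmi(prices, x, dt, window, var_floor=1e-12):
    """One-factor CMI for each new price, refit on the trailing window.

    For every index from `window` onward, the drift f and scale sigma^2 are
    re-estimated from the last `window` increments (closed-form stand-in
    for the simplex fit), and the momentum of the newest increment is returned.
    """
    cmi = []
    for end in range(window, len(prices)):
        seg = prices[end - window:end + 1]
        vel = [(seg[i + 1] - seg[i]) / dt for i in range(window)]
        wgt = [seg[i] ** (-2.0 * x) for i in range(window)]  # 1 / F^(2x), prepoint
        drift = sum(w * v for w, v in zip(wgt, vel)) / sum(wgt)
        var = dt * sum(w * (v - drift) ** 2 for w, v in zip(wgt, vel)) / window
        if var <= var_floor:
            cmi.append(0.0)  # degenerate (flat or perfectly trending) window
            continue
        cmi.append(wgt[-1] * (vel[-1] - drift) / var)
    return cmi


# Example usage: a sudden upward move produces a positive momentum.
signal = windowed_cmi([100.0, 101.0, 100.0, 101.0, 100.0, 103.0], x=0.5, dt=1.0, window=5)
```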

The trading rules are a collection of logical conditions among the CMI, prices and optimization 
parameters that could be window sizes, time resolutions, or trigger thresholds. Based on the relationships 
between CMI and optimization parameters, a trading decision is made. The cost function in the outer 
shell is either the overall equity or the risk-adjusted profit (essentially the return). The inner and outer 
shell optimizations are coupled through some of the optimization parameters (e.g., time resolution of the 
data, window sizes), which justifies the recursive nature of the optimization. 

Next, we describe in more detail the concrete implementation of this system. 

4.2. Data Processing 

The CMI formalism is general and by construction permits us to treat multivariate coupled markets. 
In certain conditions (e.g., shorter time scales of data), and also due to superior scalability across different 
markets, it is desirable to have a trading system for a single instrument, in our case the S&P futures 
contracts that are traded electronically on the Chicago Mercantile Exchange (CME). The focus of our system 
was intra-day trading, at time scales of data used in generating the buy/sell signals from 10 to 60 secs. In 
particular, we here give some results obtained when using data having a time resolution $\Delta t$ of 55 secs (the 
time between consecutive data elements is 55 secs). This particular choice of time resolution reflects the 
set of optimization parameters that have been applied in actual trading. 

It is important to remark that a data point in our model does not necessarily mean an actual tick 
datum. For some trading time scales and for noise reduction purposes, data is pre-processed into 
sampling bins of length $\Delta t$ using either a standard averaging procedure or spectral filtering (e.g., wavelets, 
Fourier) of the tick data. Alternatively, the data can be defined in block bins that contain disjoint sets of 
averaged tick data, or in overlapping bins of width $\Delta t$ that update at every $\Delta t' < \Delta t$, such that an effective 
resolution $\Delta t'$ shorter than the width of the sampling bin is obtained. We present here work in which we 
have used disjoint block bins and a standard average of the tick data with time stamps falling within the 
bin width. 

In Figs. 1 and 2 we present examples of S&P futures data sampled with 55 secs resolution. We 
remark that there are several time scales, from minutes to one hour, at which an automated trading 
system might extract profits. Fig. 2 illustrates the sustained short trading region of 1.5 hours and several 
shorter long and short trading regions of about 10-20 mins. Fig. 1 illustrates that the profitable regions are 
prominent even for data representing a relatively flat market period: June 20 shows an uptrend region 
of about 1 hour 20 mins and several short and long trading domains between 10 mins and 20 mins. In 
both situations, there is a larger number of opportunities at time resolutions smaller than 5 mins. 

The time scale at which we sample the data for trading is itself a parameter that is extracted from 
the optimization of the trading rules and of the Lagrangian cost function Eq. (27). This is one of the 
coupling parameters between the inner- and the outer-shell optimizations. 
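
The disjoint block-bin averaging of tick data described in this subsection can be sketched as follows. The function name `block_bin_average`, the `(timestamp, price)` tuple layout, and the choice to skip empty bins are assumptions made for illustration.

```python
def block_bin_average(ticks, dt):
    """Average tick prices in disjoint bins of width dt (seconds).

    ticks: iterable of (timestamp_seconds, price), sorted by time.
    Returns a list of (bin_start_time, mean_price); empty bins are skipped.
    """
    bins = {}
    t0 = None
    for t, price in ticks:
        if t0 is None:
            t0 = t  # the first tick defines the origin of the bin grid
        k = int((t - t0) // dt)
        total, count = bins.get(k, (0.0, 0))
        bins[k] = (total + price, count + 1)
    return [(t0 + k * dt, total / count) for k, (total, count) in sorted(bins.items())]
```

For example, with `dt = 55` the ticks stamped at 0, 10 and 54 seconds fall into the first bin and are replaced by their average price.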

4.3. Inner-Shell Optimization 

A cycle of optimization runs has three parts: training, testing, and finally real-time use (a 
variant of testing). Training consists in choosing a data set and performing the recursive optimization, 
which produces optimization parameters for trading. In our case there are six parameters: the time 
resolution $\Delta t$ of price data, the length of window $W$ used in the local fitting procedures and in 
computation of moving averages of trading signals, the drift $f^F$, volatility coefficient $\sigma$ and exponent $x$ 
from Eq. (28), and a multiplicative factor $M$ necessary for the trading rules module, as discussed below. 

The optimization parameters computed from the training set are then applied to various test sets and 
final profit/loss analyses are produced. Based on these, the best set of optimization parameters is chosen 
to be applied in real-time trading runs. We remark once again that a single training data set could support 
more than one profitable set of parameters, and the choice can be a function of the trader's interest and the 
specific market dynamics targeted (e.g., short/long time scales). The optimization parameters corresponding to 
the global minimum in the training session may not necessarily represent the parameters that lead to robust 
profits across real-time data. 

The training optimization occurs in two inter-related stages. An inner-shell maximum likelihood 
optimization over all training data is performed. The cost function that is fitted to data is the effective 
action constructed from the Lagrangian Eq. (27) including the pre-factors coming from the measure 
element in the expression of the short-time probability distribution Eq. (12). This is based on the fact [39] 
that in the context of Gaussian multiplicative stochastic noise, the macroscopic transition probability 
$P(F, t|F', t')$ to start with the price $F'$ at $t'$ and reach the price $F$ at $t$ is determined by the short-time 
Lagrangian Eq. (27), 

$$P(F, t|F', t') = \prod_i \frac{1}{(2\pi \sigma^2 F_{i-1}^{2x} dt_i)^{1/2}} \exp\left( -\sum_i L(i|i-1) \, dt_i \right) , \tag{30}$$

with $dt_i = t_i - t_{i-1}$. Recall that the main assumption of our model is that price increments (or the 
logarithm of price ratios, depending on which variables are considered independent) could be described 
by a system of coupled stochastic, non-linear equations as in Eq. (10). These equations are deceptively 
simple in structure, yet depending on the functional form of the drift coefficients and the multiplicative 
noise, they could describe a variety of interactions between financial instruments in various market 
conditions (e.g., constant elasticity of variance model [53], stochastic volatility models, etc.). In 
particular, this type of model includes the case of Black-Scholes price dynamics ($x = 1$). 

In the system presented here, we have applied the model from Eq. (28). The fitted parameters were 
the drift coefficient $f^F$ and the exponent $x$. In the case of a coupled futures and cash system, besides the 
corresponding values of $f^F$ and $x$ for the cash index, another parameter, the correlation coefficient $\rho$ as 
introduced in Eq. (10), must be considered. 
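
A minimal sketch of the inner-shell cost function for the one-factor model of Eq. (28) follows: the effective action is the sum over epochs of the Lagrangian of Eq. (27) times the time step, plus the logarithm of the measure prefactor. The function name `neg_log_likelihood` and the uniform time step are assumptions; in the system described here a global optimizer such as ASA would minimize this quantity, which is not shown.

```python
import math


def neg_log_likelihood(prices, dt, f, sigma, x):
    """Effective action for dF = f dt + sigma F^x dz, summed over the series.

    Prefactors are evaluated at the previous price (pre-point prescription).
    """
    cost = 0.0
    for prev, cur in zip(prices[:-1], prices[1:]):
        var = sigma ** 2 * prev ** (2.0 * x) * dt          # variance of dF over one step
        action = (cur - prev - f * dt) ** 2 / (2.0 * var)  # Lagrangian times dt
        cost += action + 0.5 * math.log(2.0 * math.pi * var)
    return cost
```

Lower values correspond to parameter sets under which the observed increments are more probable.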

4.4. Trading Rules (Outer-Shell) Recursive Optimization 

In the second part of the training optimization, we calculate the CMI and execute trades as required 
by a selected set of trading rules based on CMI values, price data or combinations of both indicators. 

Recall that three external shell optimization parameters are defined: the time resolution $\Delta t$ of the 
data expressed as the time interval between consecutive data points, the window length $W$ (in number of 
time epochs or data points) used in the adaptive calculation of CMI, and a numerical coefficient $M$ that 
scales the momentum uncertainty discussed below. 

At each moment a local refit of $f^F$ and $\sigma$ over data in the local window $W$ is executed, moving the 
window $W$ across the training data set and using the zeroth-order optimization parameters $f^F$ and $x$ 
resulting from the inner-shell optimization as a first guess. It was found that a faster quasi-local code is 
sufficient for computational purposes for these adaptive updates. In more complicated models, ASA can 
be successfully applied recursively, although in real-time trading the response time of the system is a 
major factor that requires attention. 

All expressions that follow can be generalized to coupled systems in the manner described in 
Section 3. Here we use the one factor nonlinear model given by Eq. (28). At each time epoch we 
calculate the following momentum related quantities: 

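$$\Pi^F = \frac{1}{\sigma^2 F^{2x}} \left( \frac{dF}{dt} - f^F \right) ,$$

$$\Pi^F_0 = -\frac{f^F}{\sigma^2 F^{2x}} ,$$

$$\Delta\Pi^F = \langle (\Pi^F - \langle \Pi^F \rangle)^2 \rangle^{1/2} = \frac{1}{\sigma F^x \sqrt{dt}} , \tag{31}$$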

where we have used $\langle \Pi^F \rangle = 0$ as implied by Eqs. (28) and (27). In the previous expressions, $\Pi^F$ is the 
CMI, $\Pi^F_0$ is the neutral line or the momentum of a zero change in prices, and $\Delta\Pi^F$ is the uncertainty of 
momentum. The last quantity reflects the Heisenberg principle, as derived from Eq. (28) by calculating 

$$\Delta F = \langle (dF - \langle dF \rangle)^2 \rangle^{1/2} = \sigma F^x \sqrt{dt} ,$$

$$\Delta\Pi^F \, \Delta F \geq 1 , \tag{32}$$

where all expectations are in terms of the exact noise distribution, and the calculation implies the Ito 
approximation (equivalent to considering non-anticipative functions). Various moving averages of these 
momentum signals are also constructed. Other dynamical quantities, such as the Hamiltonian, could be used as 
well. (By analogy to the energy concept, we found that the Hamiltonian carries information regarding the 
overall trend of the market, giving another useful measure of price volatility.) 

Regarding the practical implementation of the previous relations for trading, some comments are 
necessary. In terms of discretization, if the CMI are calculated at epoch $i$, then $dF_i = F_i - F_{i-1}$, 
$dt_i = t_i - t_{i-1} = \Delta t$, and all prefactors are computed at moment $i-1$ by the Ito prescription (e.g., 
$\sigma F^x = \sigma F^x_{i-1}$). The momentum uncertainty band $\Delta\Pi^F$ can be calculated from the discretized theoretical 
value Eq. (31), or by computing the estimator of the standard deviation from the actual time series of $\Pi^F$. 
There are also two ways of calculating averages over CMI values: One way is to use the set of local 
optimization parameters $\{f^F, \sigma\}$ obtained from the local fit procedure in the current window $W$ for all 
CMI data within that window (local-model average). The second way is to calculate each CMI in the 
current local window $W$ with another set $\{f^F, \sigma\}$ obtained from a previous local fit window measured 
from the CMI data backwards $W$ points (multiple-models averaged, as each CMI corresponds to a 
different model in terms of the fitting parameters $\{f^F, \sigma\}$). 

The last observation is that the neutral line divides all CMI in two classes: long signals, when 
$\Pi^F > \Pi^F_0$, as any CMI satisfying this condition indicates a positive price change, and short signals when 
$\Pi^F < \Pi^F_0$, which reflects a negative price change. 

After the CMI are calculated, based on their meaning as statistical momentum indicators, trades are 
executed following a relatively simple model: Entry and exit points for a long (short) trade are 
defined as points where the value of the CMI is greater (smaller) than a certain fraction of the uncertainty 
band $M \Delta\Pi^F$ ($-M \Delta\Pi^F$), where $M$ is the multiplicative factor mentioned in the beginning of this 
subsection. This is a choice of a symmetric trading rule, as $M$ is the same for long and short trading 
signals, which is suitable for volatile markets without a sustained trend, yet without diminishing too 
severely profits in a strictly bull or bear region. 

Inside the momentum uncertainty band, one could define rules to stay in a previously open trade, or 
exit immediately, because by its nature the momentum uncertainty band implies that the probabilities of 
price movements in either direction (up or down) are balanced. From another perspective, this type of 
trading rule exploits the relaxation time of a strong market advance or decline, until a trend reversal 
occurs or it becomes more probable. 
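
The symmetric rule and the two possible behaviors inside the uncertainty band can be sketched as a single decision function. The name `symmetric_signal`, the encoding of positions as +1, -1 and 0, and the `hold_inside_band` switch are assumptions made for illustration only.

```python
def symmetric_signal(cmi_value, uncertainty, m, previous=0, hold_inside_band=True):
    """Return +1 (long), -1 (short) or 0 (flat) for one epoch.

    The same factor m scales the band for long and short signals.
    Inside the band the previous position is either kept or closed.
    """
    band = m * uncertainty
    if cmi_value > band:
        return 1
    if cmi_value < -band:
        return -1
    return previous if hold_inside_band else 0
```

An asymmetric rule would simply use two different factors for the upper and lower thresholds.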

Other sets of trading rules are certainly possible, by utilizing not only the current values of the 
momenta indicators, but also their local-model or multiple-models averages. A trading rule based on the 
maximum distance between the current CMI data $\Pi^F$ and the neutral line $\Pi^F_0$ shows faster response to 
market evolution and may be more suitable to automatic trading in certain conditions. 

Stepping through the trading decisions each trading day of the training set determined the 
profit/loss of the training set as a single value of the outer-shell cost function. As ASA importance-sampled 
the outer-shell parameter space $\{\Delta t, W, M\}$, these parameters are fed into the inner shell, and a 
new inner-shell recursive optimization cycle begins. The final values for the optimization parameters in 
the training set are fixed when the largest net profit (calculated from the total equity by subtracting the 
transactions costs defined by the slippage factor) is realized. In practice, we have collected optimization 
parameters from multiple local minima that are near the global minimum (the outer-shell cost function is 
defined with the sign reversed) of the training set. 

The values of the optimization parameters $\{\Delta t, W, M, f^F, \sigma, x\}$ resulting from a training cycle are 
then applied to out-of-sample test sets. During the test run, the drift coefficient $f^F$ and the volatility 
coefficient $\sigma$ are refitted adaptively as described previously. All other parameters are fixed. We have 
mentioned that the optimization parameters corresponding to the highest profit in the training set may not 
be sufficiently robust across test sets. Therefore, for all test sets, we have tested optimization parameters 
related to the multiple minima (i.e., the global maximum profit, the second best profit, etc.) resulting from 
the training set. 
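
The net profit used in training and testing (total equity minus slippage, with the sign reversed when treated as a cost to be minimized) might be evaluated as in the following sketch. The names `net_profit` and `outer_shell_cost`, the convention that `positions[i]` is held from epoch `i` to `i + 1`, the default of 50 dollars per index point, and the charging of one slippage fee per opened trade are all assumptions of this illustration.

```python
def net_profit(prices, positions, point_value=50.0, slippage=35.0):
    """Return (net profit in dollars, number of trades).

    positions[i] in {+1, 0, -1} is held over the move prices[i] -> prices[i + 1],
    so len(prices) must equal len(positions) + 1.
    """
    gross = 0.0
    trades = 0
    previous = 0
    for i, pos in enumerate(positions):
        if pos != 0 and pos != previous:
            trades += 1  # a new long or short trade is opened
        gross += pos * (prices[i + 1] - prices[i]) * point_value
        previous = pos
    return gross - trades * slippage, trades


def outer_shell_cost(prices, positions, **kwargs):
    """Cost to be minimized: the net profit with the sign reversed."""
    return -net_profit(prices, positions, **kwargs)[0]
```

A global optimizer sampling the outer-shell parameters would call `outer_shell_cost` once per candidate parameter set.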

We performed a bootstrap-type reversal of the training-test sets (repeating the training runs 
procedures using one of the test sets, including the previous training set in the new batch of test sets), 
followed by a selection of the best parameters across all data sets. This is necessary to increase the 
chances of successful trading sessions in real-time. 
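
One simple way to express the selection of the most robust parameters is to rank candidate sets by cumulative profit over all test sets. The function name `most_robust` and the numbers in the usage note are purely illustrative placeholders, not results from this work.

```python
def most_robust(test_profits):
    """Pick the candidate with the largest cumulative profit over all test sets.

    test_profits maps a candidate label to a list of net profits,
    one entry per test set.
    """
    return max(test_profits, key=lambda c: sum(test_profits[c]))
```

For instance, `most_robust({"A": [100.0, -300.0], "B": [50.0, 20.0]})` selects candidate "B", even though "A" has the single best test result.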

5. RESULTS 

5.1. Alternative Algorithms 

In the previous sections we noted that there are different combinations of methods of processing 
data, methods of computing the CMI and various sets of trading rules that need to be tested — at least in a 
sampling manner — before launching trading runs in real-time: 

1. Data can be preprocessed in block or overlapping bins, or forecasted data derived from the most 
probable transition path [41] could be used as in one of our most recent models. 

2. Exponential smoothing, wavelets or Fourier decomposition can be applied for statistical 
processing. We presently favor exponential moving averages. 

3. The CMI can be calculated using averaged data or directly with tick data, although the 
optimization parameters were fitted from preprocessed (averaged) price data. 

4. The trading rules can be based on current signals (no average is performed over the signals 
themselves), on various averages of the CMI trading signals, on various combinations of CMI data 
(momenta, neutral line, uncertainty band), on symmetric or asymmetric trading rules, or on mixed 
price-CMI trading signals. 

5. Different models (one- and two-factor coupled) can be applied to the same market instrument, 
e.g., to define complementary indicators. 
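
As an illustration of the exponential moving averages mentioned in item 2, the standard recursion is sketched below. The smoothing constant `alpha` and the initialization with the first value are assumptions; the text does not state which variant was used.

```python
def exponential_moving_average(values, alpha):
    """Standard EMA: s_0 = v_0, s_t = alpha * v_t + (1 - alpha) * s_{t-1}."""
    smoothed = []
    for v in values:
        s = v if not smoothed else alpha * v + (1.0 - alpha) * smoothed[-1]
        smoothed.append(s)
    return smoothed
```

Larger values of `alpha` weight recent data more heavily and so shorten the response time of the average.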

The selection process evidently must consider many specific economic factors (e.g., liquidity of a 
given market), besides all other physical, mathematical and technical considerations. In the work 
presented here, guided by tests of our system and by previous experience, we focused on S&P500 
futures electronic trading, using block processed data, and symmetric, local-model and multiple-models 
trading rules. In Table 1 we show results obtained for several training and testing sets in the mentioned 
context. 

5.2. Trading System Design 

The design of a successful electronic trading system is complex as it must incorporate several 
aspects of a trader's actions that sometimes are difficult to translate into computer code. Three important 
features that must be implemented are factoring in the transactions costs, devising money management 
techniques, and coping with execution deficiencies. 

Generally, most trading costs can be included under the "slippage factor," although this could easily 
lead to poor estimates. Given that the margins of profit from exploiting market inefficiencies are thin, a 
high slippage factor can easily result in a non-profitable trading system. In our situation, for testing 
purposes we used a $35 slippage factor per buy & sell order, a value we believe is rather high for an 
electronic trading environment, although it represents less than three ticks of a mini-S&P futures contract. 
(The mini-S&P is the S&P futures contract that is traded electronically on CME.) This higher value was 
chosen to protect ourselves against the bid-ask spread, as our trigger price (at what price the CMI was 
generated) and execution price (at what price a trade signaled by a CMI was executed) were taken to be 
equal to the trading price. (We have changed this aspect of our algorithm in later models.) The slippage 
is also strongly influenced by the time resolution of the data. Although the slippage is linked to bid-ask 
spreads and market volatility in various formulas [54], the best estimate is obtained from experience and 
actual trading. 
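
The statement that the slippage factor is less than three ticks can be checked with a short calculation. The tick size of 0.25 index points and the multiplier of 50 dollars per index point are the contract specifications assumed here; they are not quoted in the text.

```python
TICK_SIZE = 0.25      # index points per tick (assumed contract specification)
POINT_VALUE = 50.0    # dollars per index point (assumed contract specification)
SLIPPAGE = 35.0       # dollars per buy & sell order, as used in testing

tick_value = TICK_SIZE * POINT_VALUE        # dollars per tick
slippage_in_ticks = SLIPPAGE / tick_value   # slippage expressed in ticks
```

Under these assumptions one tick is worth 12.50 dollars, so the slippage factor corresponds to 2.8 ticks.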

Money management was introduced in terms of a trailing stop condition that is a function of the 
price volatility, and a stop-loss threshold that we fixed by experiment to a multiple of the mini-S&P 
contract value ($200). It is tempting to tighten the trailing stop or to work with a small stop-loss value, 
yet we found, as otherwise expected, that higher losses occurred as the signals generated by our 
stochastic model were bypassed. 

Regarding the execution process, we have to account for the response of the system to various 
execution conditions in the interaction with the electronic exchange: partial fills, rejections, uptick rule 
(for equity trading), etc. Except for some special conditions, all these steps must be automated. 

5.3. Some Explicit Results 

Typical CMI data in Figs. 3 and 4 (obtained from real-time trading after a full cycle of training-testing 
was performed) are related to the price data in Figs. 1 and 2. We have plotted the fastest (55 secs 
apart) CMI values $\Pi^F$, the neutral line $\Pi^F_0$ and the uncertainty band $\Delta\Pi^F$. All CMI data were produced 
using the optimization parameters set {55 secs, 88 epochs, 0.15} of the second-best net profit obtained 
with the training set "4D ESMO 0321-0324" (Table 1). 

Although the CMIs exhibit an inherently ragged nature and oscillate around a zero mean value 
within the uncertainty band (the width of which is decreasing with increasing price volatility, as the 
uncertainty principle would also indicate), there are time scales at which the CMI average, or some 
persistence time, is not balanced about the neutral line. 

These characteristics, which we try to exploit in our system, are better depicted in Figs. 5 and 6. 
One set of trading signals, the local-model average of the neutral line $\langle \Pi^F_0 \rangle$ and the uncertainty band 
multiplied by the optimization factor $M = 0.15$, and centered around the theoretical zero mean of the 
CMI, is represented versus time. Note entry points in a short trading position ($\langle \Pi^F_0 \rangle > M \Delta\Pi^F$) at 
around 10:41 (Fig. 5 in conjunction with S&P data in Fig. 1) with a possible exit at 11:21 (or later), and a 
first long entry ($\langle \Pi^F_0 \rangle < -M \Delta\Pi^F$) at 12:15. After 14:35, a stay long region appears ($\langle \Pi^F_0 \rangle < 0$), 
which indicates correctly the price movement in Fig. 1. 

In Fig. 6 corresponding to June 22 price data from Fig. 2, a first long signal is generated at around 
12:56 and a first short signal is generated at 14:16 that reflects the long downtrend region in Fig. 2. Due 
to the averaging process, a time lag is introduced, reflected by the long signal at 12:56 in Fig. 4, related to 
a past upward trend seen in Fig. 2; yet the neutral line relaxes rather rapidly (given the 55 sec time 
resolution and the window of 88 epochs ~ 1.5 hours) toward the uncertainty band. A judicious choice of trading 
rules, or avoiding standard averaging methods, helps in controlling this lag problem. 

In Tables 1 and 2 we show some results obtained for several training and testing sets following the 
procedures described at the end of the previous section. In both tables, under the heading "Training" or 
"Testing Set" we specify the data set used (e.g., "4D ESMO 0321-0324" represents four days of data from 
the mini-S&P futures contract that expired in June). The type of trading rules used is identified by 
"LOCAL MODEL" or "MULTIPLE MODELS" tags. These tags refer to how we calculate the averages 
of the trading signals: either by using a single pair of optimization parameters $\{f^F, \sigma\}$ for all CMI data 
within the current adaptive fit window, or a different pair $\{f^F, \sigma\}$ for each CMI data. In the "Statistics" 
column we report the net (subtracting the slippage) profit or loss (in parentheses) across the whole data 
set, the total number of trades ("trades"), the number of days with positive balance ("days +"), and the 
percentage of winning trades ("winners"). The "Parameters" are the optimization parameters resulting 
from the first three best profit maxima of each listed training set. The parameters are listed in the order 
$\{\Delta t, W, M\}$, with the data time resolution $\Delta t$ measured in seconds, the length of the local fit window $W$ 
measured in time epochs, and $M$ the numerical coefficient of the momentum uncertainty band $\Delta\Pi^F$. 

Recall that the trading rules presented are symmetric (the long and short entry/exit signals are 
controlled by the same $M$ factor), and we apply a stay-long condition if the neutral line is below the 
average momentum $\langle \Pi^F \rangle = 0$ and stay-short if $\Pi^F_0 > 0$. The drift $f^F$ and volatility coefficient $\sigma$ are 
refitted adaptively and the exponent $x$ is fixed to the value obtained in the training set. Typical values are 
$f^F \in \pm[0.003 : 0.05]$, $x \in \pm[0.01 : 0.03]$. During the local fit, due to the shorter time scale involved, the 
drift may increase by a factor of ten, and $\sigma \in [0.01 : 1.2]$. 

Comparing the data in the training and testing tables, we note that the most robust optimization 
factors (in terms of maximum cumulative profit resulting for all test sets) do not correspond to the 
maximum profit in the training sets: For the local-model rules, the optimum parameters are 
{55, 88, 0.15}, and for the multiple-models rules the optimum set is {45, 72, 0.2}, both realized by the 
training set "4D ESMO 0321-0324." 

Other observations are that, for the data presented here, the multiple-models averages trading rules 
consistently performed better and are more robust than the local-model averages trading rules. The 
number of trades is similar, varying between 15 and 35 (eliminating cumulative values smaller than 10 
trades), and the time scale of the local fit is rather long, in the 30 mins to 1.5 hour range. In the current 
set-up, this extended time scale implies that it is advisable to deploy this system as a trader-assisted tool. 

An important factor is the average length of the trades. For the type of rules presented in this work, 
this length ranges from several minutes up to one hour, as the time scale of the local fit window mentioned 
above suggested. 

Related to the length of a trade is the length of a winning long/short trade in comparison to a losing 
long/short trade. Our experience indicates that a ratio of 2:1 between the length of a winning trade and 
the length of a losing trade is desirable for a reliable trading system. Here, using the local-model trading 
rules seems to offer an advantage, although this is not as clear as one would expect. 

Finally, the training sets data (Table 1) show that the percentage of winners is markedly higher in 
the case of multiple-models average than local-average trading rules. In the testing sets (Table 2) the 
situation is almost reversed, albeit the overall profits (losses) are higher (smaller) in the multiple-model 
case. Apparently, the multiple-model trading rules can stay in winning trades longer to increase profits, 
relative to losses incurred with these rules in losing trades. (In the testing sets, this correlates with the 
higher number of trades executed using local-model trading rules.) 

6. CONCLUSIONS 

6.1. Main Features 

The main stages of building and testing this system were: 

1. We developed a multivariate, nonlinear statistical mechanics model of S&P futures and cash 
markets, based on a system of coupled stochastic differential equations. 

2. We constructed a two-stage, recursive optimization procedure using methods of ASA global 
optimization: An inner-shell extracts the characteristics of the stochastic price distribution and an 
outer-shell generates the technical indicators and optimizes the trading rules. 

3. We trained the system on different sets of data and retained the multiple minima generated 
(corresponding to the global maximum net profit realized and the neighboring profit maxima). 

4. We tested the system on out-of-sample data sets, searching for the most robust optimization 
parameters to be used in real-time trading. Robustness was estimated by the cumulative profit/loss across 
diverse test sets, and by testing the system against a bootstrap-type reversal of training-testing sets in the 
optimization cycle. 

Modeling the market as a dynamical physical system makes possible a direct representation of 
empirical notions such as market momentum in terms of CMI derived naturally from our theoretical model. 
We have shown that other physical concepts such as the uncertainty principle may lead to quantitative signals 
(the momentum uncertainty band $\Delta\Pi^F$) that capture other aspects of market dynamics and which can be 
used in real-time trading. 

6.2. Summary 

We have presented a description of a trading system composed of an outer-shell trading-rule model 
and an inner-shell nonlinear stochastic dynamic model of the market of interest, S&P500. The inner-shell 
is developed adhering to the mathematical physics of multivariate nonlinear statistical mechanics, from 
which we develop indicators for the trading-rule model, i.e., canonical momenta indicators (CMI). We 
have found that keeping our model faithful to the underlying mathematical physics is not a limiting 
constraint on profitability of our system; quite the contrary. 

An important result of our work is that the ideas for our algorithms, and the proper use of the 
mathematical physics faithful to these algorithms, must be supplemented by many practical 
considerations en route to developing a profitable trading system. For example, since there is a subset of 
parameters, e.g., time resolution parameters, shared by the inner- and outer-shell models, recursive 
optimization is used to get the best fits to data, as well as developing multiple minima with approximately 
similar profitability. The multiple minima often have additional features requiring consideration for 
real-time trading, e.g., more trades per day increasing robustness of the system, etc. The nonlinear stochastic 
nature of our data required a robust global optimization algorithm. The parameters output from 
these training sets were then applied to testing sets on out-of-sample data. The best models and 
parameters were then used in real-time by traders, further testing the models as a precursor to eventual 
deployment in automated electronic trading. 

We have used methods of statistical mechanics to develop our inner-shell model of market 
dynamics and a heuristic AI type model for our outer-shell trading-rule model, but there are many other 
candidate (quasi-)global algorithms for developing a cost function that can be used to fit parameters to 
data, e.g., neural nets, fractal scaling models, etc. To perform our fits to data, we selected an algorithm, 
Adaptive Simulated Annealing (ASA), that we were familiar with, but there are several other candidate 
algorithms that likely would suffice, e.g., genetic algorithms, tabu search, etc. 

We have shown that a minimal set of trading signals (the CMI, the neutral line representing the 
momentum of the trend of a given time window of data, and the momentum uncertainty band) can 
generate a rich and robust set of trading rules that identify profitable domains of trading at various time 
scales. This is a confirmation of the hypothesis that markets are not efficient, as noted in other 
studies [11,30,55]. 

6.3. Future Directions 

Although this paper focused on trading of a single instrument, the futures S&P 500, the code we 
have developed can accommodate trading on multiple markets. For example, in the case of 
tick-resolution coupled cash and futures markets, which was previously prototyped for inter-day 
trading [29,30], the utility of CMI stems from three directions: 

(a) The inner-shell fitting process requires a global optimization of all parameters in both futures 
and cash markets. 

(b) The CMI for futures contain, by our Lagrangian construction, the coupling with the cash market 
through the off-diagonal correlation terms of the metric tensor. The correlation between the futures and 
cash markets is explicitly present in all futures variables. 

(c) The CMI of both markets can be used as complementary technical indicators for trading in 
the futures market. 

Several near term future directions are of interest: orienting the system toward shorter trading time 
scales (10-30 secs) more suitable for electronic trading, introducing fast response "averaging" methods 
and time scale identifiers (exponential smoothing, wavelets decomposition), identifying mini-crashes 
points using renormalization group techniques, investigating the use of CMI in pattern-recognition based 
trading rules, and exploring the use of forecasted data evaluated from most probable transition path 
formalism. 

Our efforts indicate the invaluable utility of a joint approach (AI-based and quantitative) in 
developing automated trading systems. 

6.4. Standard Disclaimer 

We must emphasize that there are no claims that all results are positive or that the present system is 
a safe source of riskless profits. There are as many negative results as positive, and a lot of work is necessary 
to extract meaningful information. Our purpose here is to describe an approach to developing an 
electronic trading system complementary to those based on neural-networks type technical analysis and 
pattern recognition methods. The system discussed in this paper is rooted in the physical principles of 
nonequilibrium statistical mechanics, and we have shown that there are conditions under which such a 
model can be successful. 

FIGURE CAPTIONS 

Figure 1. Futures and cash data, contract ESUO June 20: solid line: futures; dashed line: cash. 

Figure 2. Futures and cash data, contract ESUO June 22: solid line: futures; dashed line: cash. 

Figure 3. CMI data, real-time trading June 20: solid line: CMI; dashed line: neutral line; 
dotted line: uncertainty band. 

Figure 4. CMI data, real-time trading, June 22: solid line: CMI; dashed line: neutral line; 
dotted line: uncertainty band. 

Figure 5. CMI trading signals, real-time trading June 20: dashed line: local-model average of the 
neutral line; dotted line: uncertainty band multiplied by the optimization parameter M = 0.15. 

Figure 6. CMI trading signals, real-time trading June 22: dashed line: local-model average of the 
neutral line; dotted line: uncertainty band multiplied by the optimization parameter M = 0.15. 

TABLE CAPTIONS 

Table 1. Matrix of Training Runs. 
Table 2. Matrix of Testing Runs. 

[Figure: Canonical Momenta Indicators (CMI), time resolution = 55 secs. Curves: $\langle \Pi^F_0 \rangle$ (local) and $M \Delta\Pi^F$. Vertical axis from -0.5 to 0.5; horizontal axis TIME (mm-dd hh-mm-ss), from 06-20 10:46:16 to 06-20 14:44:44.] 

- Figure 6 - 

Canonical Momenta Indicators 
TRAINING SET        TRADING RULES      PARAMETERS     STATISTICS
                                       (Δt W M)       $ profit (loss)   # trades   # days +   % winners

4D ESM0 0321-0324   LOCAL MODEL        55  90 0.125   1390              16         3          75
                                       55  88 0.15    1215              16         3          75
                                       60  40 0.275   1167              17         3          76
                    MULTIPLE MODELS    45  76 0.175   2270              18         4          83
                                       45  72 0.20    2167.5            17         4          88
                                       60  59 0.215   1117.5            17         3          76

5D ESM0 0327-0331   LOCAL MODEL        20  22 0.60    437               15         3          67
                                       20  24 0.55    352               16         3          63
                                       10  54 0.5     (35)              1
                    MULTIPLE MODELS    45  74 0.25    657.5             3          5          100
                                       40  84 0.175   635               19         3          68
                                       30 110 0.15    227.5             26         2          65

5D ESM0 0410-0414   LOCAL MODEL        50 102 0.10    1875              35         3          60
                                       50 142 0.10    1847              19         3          58
                                       35 142 0.10    1485              34         4          62
                    MULTIPLE MODELS    45  46 0.25    2285              39         3          72
                                       40  48 0.30    2145              23         3          87
                                       60  34 0.30    1922.5            29         3          72



- Table 2 -

TESTING SETS          LOCAL MODEL                                MULTIPLE MODELS
  STATISTICS          PARAMETERS (Δt W M)                        PARAMETERS (Δt W M)
                      55 90 0.125   55 88 0.15   60 40 0.275     45 76 0.175   45 72 0.20   60 59 0.215

5D ESM0 0327-0331
  $ profit (loss)     (712)         (857)        (1472)          (605)         (220)        (185)
  # trades            20            17           16              18            12           11
  # days +            2             2            1               3             1            1
  % winners           50            47           44              67            67           54

4D ESM0 0403-0407
  $ profit (loss)     (30)          258          602             1340          2130         932
  # trades            18            13           16              16            17           13
  # days +            3             3            2               1             1            1
  % winners           56            54           56              50            53           38

5D ESM0 0410-0414
  $ profit (loss)     750           1227         (117)           (530)         (1125)       (380)
  # trades            30            21           23              23            20           18
  # days +            3             3            3               2             2            3
  % winners           60            62           48              48            50           50
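
As a cross-check on the testing results in Table 2, the per-run profits can be tallied with a short 
script. The following is a minimal Python sketch and not part of the original system: the values are 
transcribed from Table 2, parenthesized entries are read as losses following the "$ profit (loss)" 
heading, and the names TESTING_PROFIT and summarize are hypothetical. 

```python
# Testing-set profits (losses) in dollars, transcribed from Table 2.
# Keys are (dt, W, M) parameter sets; each list follows the order of the
# testing sets: ESM0 0327-0331, ESM0 0403-0407, ESM0 0410-0414.
# Parenthesized table entries are entered as negative numbers (assumption:
# parentheses denote losses, per the "$ profit (loss)" heading).
TESTING_PROFIT = {
    "local": {
        (55, 90, 0.125): [-712, -30, 750],
        (55, 88, 0.15): [-857, 258, 1227],
        (60, 40, 0.275): [-1472, 602, -117],
    },
    "multiple": {
        (45, 76, 0.175): [-605, 1340, -530],
        (45, 72, 0.20): [-220, 2130, -1125],
        (60, 59, 0.215): [-185, 932, -380],
    },
}


def summarize(profits):
    """Return (net profit per parameter set, number of losing runs, number of winning runs)."""
    net = {}
    losses = 0
    wins = 0
    for model in profits.values():
        for params, runs in model.items():
            net[params] = sum(runs)
            losses += sum(1 for p in runs if p < 0)
            wins += sum(1 for p in runs if p > 0)
    return net, losses, wins


net, losses, wins = summarize(TESTING_PROFIT)
for params, total in net.items():
    print(params, total)
print("losing runs:", losses, "winning runs:", wins)
```

Under this reading, 11 of the 18 testing runs lose money and 7 are profitable, and the net profit over 
the three testing sets ranges from -987 to 785 dollars depending on the parameter set. 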



